# High-precision speech transcription

Whisper Medium Oswald

A Haitian Creole speech recognition model fine-tuned from OpenAI Whisper-medium, focused on high-accuracy transcription.

Speech Recognition

Transformers Other

Distil Whisper Large V3 Ptbr

This is a fine-tuned version of distil-whisper-large-v3, designed for automatic speech recognition (ASR) of Brazilian Portuguese and trained on a combination of the Common Voice 16 dataset and a private dataset.

Speech Recognition

Rev's Reverb ASR model is trained on 200,000 hours of professionally transcribed English speech data, making it one of the most accurate open-source automatic speech recognition systems for English.

Speech Recognition English

Whisper Medium Pt

A Portuguese-optimized Whisper Medium speech recognition model that achieves a word error rate (WER) of 6.579 on the Common Voice 11 dataset.

Speech Recognition

Transformers Other

Exp W2v2t It Wavlm S895

An Italian automatic speech recognition model fine-tuned from microsoft/wavlm-large on the Common Voice 7.0 Italian dataset.

Speech Recognition

Transformers Other

Ai Light Dance Singing2 Ft Wav2vec2 Large Xlsr 53 5gram V3

An automatic speech recognition model fine-tuned from wav2vec2-large-xlsr-53 that specializes in singing voice recognition.

Speech Recognition

Ai Light Dance Singing2 Ft Wav2vec2 Large Xlsr 53 5gram V4 1

This model is an automatic speech recognition (ASR) model based on the wav2vec2-large-xlsr-53 architecture, fine-tuned on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset, primarily used for singing voice recognition tasks.

Speech Recognition

Ai Light Dance Stepmania Ft Wav2vec2 Large Xlsr 53 V5

Automatic speech recognition model based on wav2vec2-large-xlsr-53, fine-tuned on the GARY109/AI_LIGHT_DANCE dataset

Speech Recognition

Ai Light Dance Singing Ft Pretrain Wav2vec2 Large Lv60

This model is an automatic speech recognition (ASR) model based on the wav2vec2-large-lv60 architecture, fine-tuned on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING dataset, primarily used for singing voice recognition tasks.

Speech Recognition

Wav2vec2 2 Bart Large No Adapter

This model is an automatic speech recognition (ASR) model trained on the LibriSpeech ASR dataset, capable of converting English speech into text.

Speech Recognition

Wav2vec2 2 Bert Large No Adapter

An automatic speech recognition (ASR) model trained on the LibriSpeech dataset for converting English speech to text

Speech Recognition

Wav2vec2 2 Bert Large No Adapter Frozen Enc

This model is a speech recognition model trained on the librispeech_asr dataset, achieving a word error rate (WER) of 2.0133 on the evaluation set.

Speech Recognition

Wav2vec2 Large Xlsr 53 French

This is an automatic speech recognition (ASR) model based on the wav2vec2 architecture, specifically fine-tuned for French, achieving a word error rate (WER) of 12.82% on the Common Voice French test set.

Speech Recognition

Transformers French
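
Several entries above quote a word error rate (WER). As a rough illustration of how that metric is usually defined (word-level edit distance between a reference transcript and a model's output, divided by the number of reference words), here is a minimal sketch. The function name `word_error_rate` is hypothetical, and this is not the evaluation code used by any of the listed models; published figures also depend on text normalization (casing, punctuation) that the sketch does not apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The sketch returns a fraction; multiply by 100 to express it as a percentage.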

© 2025 AIbase